2016-11-13

Data Tidying

Raw Data

  • We received one extract from your SAP production system (in total 10487 records).
  • Columns considered for postal validation are STRASSE, PLZ, STADT.
  • The columns used for deduplication are VORNAME, NAME, STRASSE, PLZ, STADT.
  • A total of 10487 records were processed by postal validation and identity.

A few special cases were discovered during the analysis of the raw input data:

Data Preparation

The completeness of column content is relevant for the outcome of postal validation and matching.
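As an illustration, completeness can be measured as the share of non-empty values per column. The following is a minimal sketch, not the tooling used for this report; the function name `completeness` and the two sample records are invented for the example, and only the column names come from the extract.

```python
COLUMNS = ["VORNAME", "NAME", "STRASSE", "PLZ", "STADT"]


def completeness(records, columns=COLUMNS):
    """Return the share of non-empty values per column (0.0 to 1.0)."""
    total = len(records)
    result = {}
    for col in columns:
        # A value counts as filled if it is not None and not blank.
        filled = sum(1 for r in records if (r.get(col) or "").strip())
        result[col] = filled / total if total else 0.0
    return result


# Invented sample records, for illustration only.
sample = [
    {"VORNAME": "Anna", "NAME": "Muster", "STRASSE": "Hauptstr. 1",
     "PLZ": "75179", "STADT": "Pforzheim"},
    {"VORNAME": "", "NAME": "Muster", "STRASSE": "Hauptstr. 1",
     "PLZ": None, "STADT": "Pforzheim"},
]

shares = completeness(sample)
```

Columns with a low share are the ones most likely to weaken postal validation (STRASSE, PLZ, STADT) or matching (all five columns).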

Postal Validation

Where is your customer located?

Can you reach your customer?

Distribution of result classes

Result class 4 means the address is ambiguous, result class 5 means the address could not be found.
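The distribution itself is a simple count per result class. The sketch below assumes that each validated record carries its result class as an integer; the function name and the sample values are hypothetical, and only the meaning of classes 4 and 5 is taken from the text above.

```python
from collections import Counter


def result_class_distribution(result_classes):
    """Map each result class to (count, share of all records)."""
    counts = Counter(result_classes)
    total = sum(counts.values())
    return {rc: (n, n / total) for rc, n in sorted(counts.items())}


# Invented result classes for four records, for illustration only.
distribution = result_class_distribution([1, 1, 4, 5])

# Combined share of ambiguous (4) and not found (5) addresses.
problem_share = sum(share for rc, (_, share) in distribution.items()
                    if rc in (4, 5))
```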

Matching

Do you know your customer?

Red dots indicate duplicate groups. The bigger the dot, the bigger the group.

Frequency of duplicates

Uniserv's matching engine executes a pairwise record comparison based on an underlying, configurable ruleset. Matching sorts the records into so-called cleans and duplicates.
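To make the idea of pairwise comparison against a ruleset concrete, here is a deliberately simplified sketch. It is not Uniserv's engine or its ruleset format: the ruleset is assumed to be a minimum string similarity per column, similarity is computed with Python's `difflib`, and matching pairs are merged into groups with a union-find structure. The thresholds and sample records are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical ruleset: minimum similarity required per column.
RULESET = {"VORNAME": 0.8, "NAME": 0.8, "STRASSE": 0.8,
           "PLZ": 1.0, "STADT": 0.8}


def similarity(a, b):
    """Case- and whitespace-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(r1, r2, ruleset=RULESET):
    """Two records match if every column meets its threshold."""
    return all(similarity(r1.get(col, ""), r2.get(col, "")) >= threshold
               for col, threshold in ruleset.items())


def group_duplicates(records, ruleset=RULESET):
    """Return (cleans, duplicate_groups) as record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of matching records.
    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j], ruleset):
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    cleans = [g[0] for g in groups.values() if len(g) == 1]
    duplicates = [g for g in groups.values() if len(g) > 1]
    return cleans, duplicates


# Invented sample records, for illustration only.
records = [
    {"VORNAME": "Anna", "NAME": "Muster", "STRASSE": "Hauptstr. 1",
     "PLZ": "75179", "STADT": "Pforzheim"},
    {"VORNAME": "anna ", "NAME": "Mustr", "STRASSE": "Hauptstr. 1",
     "PLZ": "75179", "STADT": "Pforzheim"},
    {"VORNAME": "Bernd", "NAME": "Beispiel", "STRASSE": "Nebenweg 5",
     "PLZ": "10115", "STADT": "Berlin"},
]

cleans, duplicates = group_duplicates(records)
```

In this example the first two records form one duplicate group and the third is a clean. Comparing every pair grows quadratically: for 10487 records it would mean roughly 55 million comparisons, so the sketch is meant to explain the principle rather than to process the full extract.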

Distribution of duplicate groups

Summary

Recommendations

  • Your data is in decent / mediocre / bad shape.
  • Almost 15 % of your mailings do not reach the intended recipient.
  • We can help you by …